ABSTRACT

Transcription factors (TFs) regulate gene expression
by binding to short DNA sequence motifs, yet their
binding specificities alone cannot explain how
certain TFs drive a diversity of biological processes.
To investigate the factors that control the
functions of the pleiotropic TF STAT3, we studied
its genome-wide binding patterns in four different
cell types: embryonic stem cells, CD4+ T cells,
macrophages and AtT-20 cells. We describe for the
first time two distinct modes of STAT3 binding. The
first is a small cell type-independent mode represented
by a set of 35 evolutionarily conserved STAT3-binding
sites that collectively regulate STAT3's own functions
and cell growth. We show that STAT3 is recruited
to sites where E2F1 is already bound before
STAT3 activation. In the second mode, a series of
different transcriptional regulatory modules (TRMs)
assemble around STAT3 to drive distinct transcriptional
programs in the four cell types. These modules
recognize cell type-specific binding sites and are
associated with factors particular to each cell type.
Our study illustrates the versatility with which STAT3
regulates both universal and cell type-specific
functions by means of distinct TRMs, a mechanism that
might be common to other pleiotropic TFs.

INTRODUCTION 

The precise spatio-temporal regulation of gene expression
programs determines an organism's development and its
interaction with its environment. Transcription factors
(TFs) control this process by binding to short DNA
sequences (typically 6-8 bp), yet their binding specificities
cannot explain the various cell type-specific functions of
many TFs. Protein binding microarrays have shown that
members of TF families such as homeodomains bind to
very similar sequences, which therefore cannot account on
their own for the enormous diversity of functional roles of
homeodomain TFs during animal development (1,2).
Potentially, cell type specificity emerges from the interplay
of TF DNA sequence specificity, co-factors and
epigenetics (3). However, despite extensive efforts, the
mechanisms that determine cell type-specific TF activity
remain elusive. A number of studies have shown that key TFs
associate locally with co-activators to constitute
'transcriptional regulatory modules' (TRMs) that endow the
key TF with cell type-specific functions. An important
example was provided in embryonic stem cells (ESCs),
where TFs assemble around the core heterodimer
SOX2-OCT4 and NANOG (4). In hematopoietic progenitor
cells, the TRM centers around GATA2, RUNX1 and
SCL/TAL1 (5), whereas in developing B cells the TRM
clusters around E2A, EBF1 and FOXO1 (6). Finally, in
trophectoderm stem cells, the TF core around which the
TRM assembles includes SMARCA4, EOMES,
TCFAP2A, GATA3 and ETS2 (and possibly STAT3
too) (7). Although experimentally characterized TRMs
are very informative as to the co-activators that key TFs
need to associate with to perform their biological
functions, these TRM models have not yet been able to
explain how pleiotropic TFs bring
about functional specificity in distinct cell types.
Examples of the pleiotropic functions of TFs are
as follows: (i) the ESC factor SOX2 is also active in neural
progenitor cells (8), (ii) the essential hematopoietic factor
SCL/TAL1 is also robustly expressed in neural progenitor
cells, (iii) the B-cell development factor FOXO1 is known
to regulate adipocyte differentiation (9) and (iv) the
trophectoderm stem-cell factor GATA3 is crucial at various
stages of CD4+ T-cell development (10).

Therefore, a fundamental question in transcriptional
regulation is how a given TF can perform highly divergent
and at the same time crucial functions across distinct cell
types (11). To address this problem, we set out to
investigate the mechanisms that enable STAT3 to regulate
distinctive gene sets leading to diverse biological outcomes in
various cell types. STAT3 has been profiled by ChIP-seq
in multiple cell types, including ESCs (4), CD4+ T cells
(12,13), macrophages (14) and AtT-20 corticotroph cells
(15). Crucially, for the dissection of cell type-specific
functions, STAT3 has radically different roles in each of
these cell types: in ESCs, STAT3 maintains pluripotency
(16), whereas in CD4+ T cells STAT3 drives
differentiation toward Th17 cells (13,17) and is also required for
Th2 cells (18). In macrophages, STAT3 is essential for the
initiation of the anti-inflammatory response mediated by
IL-10 (19,20), and in AtT-20 corticotroph cells, STAT3
promotes adrenocorticotropic hormone production as
part of the hypothalamo-pituitary-adrenal axis in
response to stress and inflammation (15,21). Clearly,
these diverse functions imply that STAT3 is able to
target different enhancers to regulate distinct genes
depending on the biological context. Other advantages of
using STAT3 as a model to investigate TF functional
specificity in the four distinct cellular types described earlier
are as follows: (i) STAT3 is an essential regulator in these
cell types and cannot be replaced by other factors;
(ii) STAT3 is activated upon induction by a cytokine
and thus constitutes a natural switch that produces
easily distinguishable outcomes and (iii) upon activation,
STAT3 initiates a measurable response that is either a
developmental program or a response to an environmental
stimulus.

Here, we analyze genome-wide STAT3 binding data
from ChIP-seq libraries profiled in ESCs, CD4+ T cells,
macrophages and AtT-20 cells and show that STAT3 has
two modes of binding: (i) a small number of
STAT3-binding sites that are common to all four cell
types analyzed, which regulate a core set of genes that
are pre-bound by E2F1 and which encode a
self-regulatory loop for STAT3 and (ii) the larger sets of
STAT3-binding events that are cell type-specific and
therefore responsible for the distinct biological outcomes of
STAT3 in the various cell types. Moreover, by integrating
data on predicted TF-binding sites, protein-protein
interactions and gene expression, we built TRM models that
predict the unique associations of STAT3 with distinct sets
of co-factors and which explain how STAT3 directs both
its cell type-independent and cell type-specific functions.
This is the first example of a TF having both cell
type-independent and cell type-specific functions mediated by
distinct TRMs, a modus operandi that might be shared by
other pleiotropic TFs.



MATERIALS AND METHODS 

ChIP-seq read mapping and peak calling

ChIP-seq reads were mapped to the mouse genome (mm9)
using Bowtie (v0.12.7) (22) with the setting 'best', and
reads mapping to more than one genomic location were
excluded (-m). Peak discovery was performed using
MACS (v1.4.1) (23) with the parameters bandwidth = 200
and genomesize = mm. The m-fold parameter
was adjusted until MACS could find ~1000-2500
high-quality peaks to construct a model. The P-value cutoff
for each ChIP-seq library was determined by increasing the
P-value from 1 × 10^-10 until the number of peaks
discovered by chance alone (false positives) was close to
1% or the P-value reached 1 × 10^-5. Detailed genome
mapping statistics and the full list of settings used by
MACS for all of the ChIP-seq data used in this article
are described in Supplementary Table S1.
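
The cutoff search described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the input layout and the example numbers are hypothetical, and the peak counts would in practice come from running MACS on the experimental library and on the swapped (control as treatment) library at each cutoff.

```python
def choose_pvalue_cutoff(scan, target_fraction=0.01):
    """Pick the most lenient cutoff whose chance-alone fraction stays at
    or below the target.

    `scan` is a list of (cutoff, n_peaks, n_chance_peaks) tuples ordered
    from the most stringent to the most lenient cutoff (hypothetical layout).
    """
    chosen = None
    for cutoff, n_peaks, n_chance in scan:
        if n_peaks == 0:
            continue  # nothing called at this cutoff, keep relaxing
        if n_chance / n_peaks <= target_fraction:
            chosen = cutoff  # still acceptable, remember it
        else:
            break  # too many chance peaks, stop relaxing
    return chosen


# Made-up counts for illustration only (not values from this study).
example_scan = [
    (1e-10, 1000, 2),
    (1e-9, 1500, 9),
    (1e-8, 2000, 30),
    (1e-7, 2500, 100),
]
best = choose_pvalue_cutoff(example_scan)
```

Stopping at the first cutoff that exceeds the target keeps the search monotonic, which matches the idea of relaxing the threshold step by step until the chance-alone fraction approaches 1%.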

Gene Ontology (GO) analysis of the ChIP-seq peak lists was
performed using GREAT (v2.0.1) (24) with the whole genome
as background. Statistical analysis was performed using
SciPy and data visualization used matplotlib, pycairo, R
and glbase (https://bitbucket.org/oaxiom/glbase/wiki/
Home/). For evolutionary conservation analysis, all
pre-computed phastCons scores were obtained from the
UCSC genome browser.

The random backgrounds used for motif enrichment
analysis or for comparison were generated by randomly
sampling 1 or 10% of the appropriate control ChIP-seq
library. Sequence libraries were made unique, allowing
only a single read per genomic position prior to sampling
random sites. This removed repeats that typically attract a
large number of sequence tags and give spurious regions of
overlap. For the 'shared overlap', 'any two cell types' and
'any three cell types', the backgrounds were combined in
equal proportions relative to the size of the contribution of
the cell-type STAT3 binding to the overlap.
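
A minimal sketch of the background construction described above, assuming reads are available as (chromosome, position) tuples. The function name and the data layout are hypothetical and are not taken from the authors' pipeline.

```python
import random


def sample_background(read_positions, fraction=0.01, seed=0):
    """Collapse reads to one per genomic position, then draw a random subset.

    `read_positions` is an iterable of (chromosome, position) tuples
    (an assumed layout). Returns a list of sampled unique positions.
    """
    unique_positions = sorted(set(read_positions))  # one read per position
    n_sample = max(1, round(len(unique_positions) * fraction))
    rng = random.Random(seed)  # fixed seed so the background is reproducible
    return rng.sample(unique_positions, n_sample)


# Toy control library: 2000 reads piled onto 500 distinct positions.
toy_reads = [("chr1", i % 500) for i in range(2000)]
background = sample_background(toy_reads, fraction=0.1)
```

Deduplicating before sampling is what prevents repeat regions with large read pile-ups from dominating the random sites.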

Gene expression data 

Expression profiles were obtained from the Gene Expression
Omnibus for ESCs (GSE27708) (25), IL-21-treated
anti-CD3/CD28-activated CD4+ T cells (GSE19198)
(12), IL-6-treated anti-CD3/CD28-activated naive CD4+
T cells (GSE21671) (13), IL-10-treated peritoneal
exudate cell (PEC) macrophages (GSE31531) (14), IL-6-treated
liver cells (GSE21060) (26) and AtT-20 cells (GSE19042)
(21). For ESCs, AtT-20 and CD4+ T cells, raw data were
processed using the BrainArray custom CDFs (27), while
for the PEC macrophage RNA-seq data, transcripts were
quantified using RSEM (28)
and annotated to Ensembl genes (release 66).

Motif discovery and identification of TRMs 

HOMER (v3.6) (29) was used for de novo motif discovery
with default parameters, with ±200 bp around the peak
summits of the list of STAT3 binding peaks and a random
background sampling of 1% of the appropriate control
ChIP-seq library. TRMs were identified by integrating
motif enrichment, gene expression and protein-protein
interaction information. De novo motifs discovered by
HOMER that did not resemble STAT3 or a STAT3
half-site were searched again for enrichment using
FIMO (30) to remove motifs with a z-score < 5.0 in all
STAT3 binding lists. De novo motifs detected by
HOMER were annotated with Tomtom (MEME suite)
(matching motifs with a q < 0.05 were kept) (31) using a
library of motifs obtained from the JASPAR and
UniPROBE databases (32,33). Known motifs were
mapped to human genes and pairwise similarities among
motifs were computed with Tomtom to determine clusters
of TFs with similar binding preferences. The original gene
annotations for the motifs were mapped to the human
homologs using the Biomart tool from Ensembl. These
mapped genes were filtered by expression in the
appropriate cell type to reduce redundancy. Based on the
distribution of intensities, genes were considered not expressed if
the normalized intensity value was <5.2 for ESCs, <4.1 in
CD4+ T cells or <3.9 in AtT-20 cells. For the macrophage
RNA-seq data, those transcripts with a
transcripts-per-million value <1.0 were disregarded. The human
protein-protein interaction network was obtained from
the BioGRID database (34) and nodes were removed for
non-expressed genes in each of the data sets. Additionally,
the nodes UBC and SUMO2 were also removed as they
connect to 60 and 8% of the entire interactome,
respectively. Full details on our method for reconstructing TRMs
will be published elsewhere.
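
Two of the filtering steps described above, the expression cutoff and the removal of hub nodes from the interaction network, can be sketched as below. The thresholds and the hub names come from the text; the function names, dictionary keys, edge-list layout and the toy gene GENEX are hypothetical, and the sketch does not attempt to reproduce the full TRM reconstruction, whose details are not given here.

```python
# Values below these cutoffs are treated as "not expressed" (from the text).
# The dictionary keys are hypothetical labels chosen for this sketch.
NOT_EXPRESSED_BELOW = {
    "ESC": 5.2,             # normalized microarray intensity
    "CD4_T": 4.1,           # normalized microarray intensity
    "AtT20": 3.9,           # normalized microarray intensity
    "macrophage_tpm": 1.0,  # RNA-seq transcripts per million
}
HUB_NODES = {"UBC", "SUMO2"}  # highly connected nodes removed from the network


def expressed_genes(expression, cell_type):
    """Return the genes whose value reaches the cutoff for this cell type."""
    cutoff = NOT_EXPRESSED_BELOW[cell_type]
    return {gene for gene, value in expression.items() if value >= cutoff}


def filter_network(edges, expression, cell_type):
    """Keep interactions whose two partners are expressed and are not hubs."""
    keep = expressed_genes(expression, cell_type) - HUB_NODES
    return [(a, b) for a, b in edges if a in keep and b in keep]


# Toy data for illustration only; GENEX is a placeholder gene name.
toy_expression = {"STAT3": 8.0, "E2F1": 6.1, "UBC": 11.0, "GENEX": 3.0}
toy_edges = [("STAT3", "E2F1"), ("STAT3", "UBC"), ("E2F1", "GENEX")]
filtered = filter_network(toy_edges, toy_expression, "ESC")
```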

ChIP-qPCR of STAT3-bound sites in macrophages

PEC macrophages were purified from pooled 6- to
8-week-old male C57Bl6JcL mice (CLEA Japan) and
treated with IL-10 (R&D Systems). Formaldehyde
cross-linked chromatin was extracted and subjected to
ChIP using antibodies to STAT3 (Santa Cruz, sc-482),
E2F1 (Millipore, 05-379) and GFP (Santa Cruz,
sc-8334), as previously described (14). qPCR was
performed on an ABI 7900, using SYBR qPCR mix
(TOYOBO) according to the manufacturer's instructions.
Primers used in this study are described in Supplementary
Table S2.

RESULTS 

A catalogue of STAT3 genomic binding sites in ESCs,
CD4+ T cells, macrophages and AtT-20 cells

To date, the genome-wide binding pattern of STAT3 has
been reported by ChIP-seq in ESCs (4), CD4+ T cells (12),
Th17 cells (13), macrophages (14) and AtT-20
corticotroph cells (15). Although each publication provides
an alignment to the genome and a set of characterized
peaks, these analyses were performed at different times, with
the sequencing reads aligned to different mouse genome
assemblies (mm8 and mm9) using various software tools
(ELAND, AceView and bowtie) and a variety of peak
discovery algorithms (MACS, ChIPseq and AceView).
In order to compare the various ChIP-seq libraries,
these and their corresponding control libraries (raw
sequence reads) were uniformly reanalyzed, except for
the STAT3 ChIP-seq library prepared in Th17 cells as it
lacks a paired control library (13). Nevertheless, CD4+ T
cells contain small numbers of Th17 cells, meaning that
the CD4+ T-cell ChIP-seq library will contain many
Th17-specific STAT3-binding sites. For each cell type,
ChIP-seq library replicates were merged into a single
fastq file and bowtie (22) was used to align the reads to
the mm9 version of the mouse genome. The peak
discovery tool used was MACS (23), which reports the number
of peaks discovered by chance alone by reversing the
experimental and control libraries. At the default cutoff
(P = 1 × 10^-5), MACS has a tendency to inflate the
number of peaks reported for larger sequence libraries at
the expense of increasing the number of false positives. To
correct for this bias, we adjusted the MACS P-value cutoff
to limit the number of peaks discovered by chance alone
(false positives) to ~1% (Supplementary Figure S1 and
Supplementary Table S1). Thus, we report 2651 (ESCs),
5152 (CD4+ T cells), 1724 (macrophages) and 7982
(AtT-20) peaks. Of the original sets of peaks defined in
the respective publications, our new peak lists contain
66% of the ESC peaks, 81% of the CD4+ T-cell peaks,
87% of the macrophage peaks and 100% of the AtT-20
peaks. Although the majority of the peaks that were
missed are lower ranked peaks (Supplementary Figure
S2), we also report new additional peaks in CD4+ T
cells (756), ESCs (245), macrophages (372) and AtT-20
cells (4881). Similarly, most of these new peaks are
lower ranked peaks by fold enrichment of the
experimental libraries over the control libraries (Supplementary
Figure S2).

STAT3 regulates a core set of genes across all four 
cellular types 

STAT3 peaks were defined by extending the summits
determined by MACS 200 bp either side of each summit,
and overlapping peaks were merged by taking the
midpoint between the two summits. Whenever peaks
overlapped, they tended to be close (Figure 1A) and only 35 peaks
were found to be common to all four libraries (Figure 1B).
These 35 peaks overlapping across all four libraries were
highly ranked STAT3-binding sites, with one-half of these
35 overlapping peaks being listed in the top 1000 STAT3
peaks in all four libraries (Figure 1C). Moreover, a Monte
Carlo simulation of the overlap suggested that the number
of overlapping peaks that would be expected among all
four libraries by chance alone is zero (expected
overlapping sites: 0.002 ± 0.045) (Figure 1B). Therefore,
the 35 STAT3-binding events shared by all four ChIP-seq
libraries are not random, while most STAT3-binding
events are highly specific to each cell type (Supplementary
Figures S3 and S4; Supplementary Table S3).
Conservation of transcriptional regulation can also be
gene-centric, where regulatory TF-binding sites do not
overlap but still regulate the same nearby gene. This has
been reported for SMAD3 (35) and during evolution for
CEBPA (36). However, we did not find robust evidence
for systematic gene-centric regulation as the number of
gene-centric observations was close to the values
expected by chance alone (Supplementary Figure S5).
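
The chance expectation for the four-way overlap can be illustrated with a small Monte Carlo sketch. It makes several simplifying assumptions that are not taken from the study: faux peaks are drawn uniformly from a single linear coordinate range rather than from the sequenceable mouse genome, overlap is judged against the summits of the first library only, and all function names are hypothetical.

```python
import bisect
import random
import statistics


def has_neighbour(sorted_positions, pos, window):
    """True if any position lies within `window` bp of `pos`."""
    i = bisect.bisect_left(sorted_positions, pos - window)
    return i < len(sorted_positions) and sorted_positions[i] <= pos + window


def count_shared(libraries, window=200):
    """Count summits of the first library that have a summit within
    `window` bp in every other library."""
    first, *others = [sorted(lib) for lib in libraries]
    return sum(
        all(has_neighbour(other, pos, window) for other in others)
        for pos in first
    )


def simulate_expected(sizes, genome_length, n_iter=1000, window=200, seed=0):
    """Mean and standard deviation of the shared overlap among faux libraries."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_iter):
        faux = [[rng.randrange(genome_length) for _ in range(n)] for n in sizes]
        counts.append(count_shared(faux, window))
    return statistics.mean(counts), statistics.pstdev(counts)
```

Calling `simulate_expected((2651, 5152, 1724, 7982), genome_length)` with the library sizes reported above and a genome-scale length would give an estimate of the same kind as the expected overlap quoted in the text, although the exact value depends on how the sequenceable genome is modelled.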

The distribution of STAT3-binding events relative to
gene locations is typical of other TF ChIP-seq libraries,



2158 Nucleic Acids Research, 2013, Vol. 41, No. 4 



[Figure 1, panels A-D. (A) Distance between two peaks (bp) for each pair of libraries: macrophages vs ESCs, AtT-20 cells vs CD4+ T cells, ESCs vs CD4+ T cells, macrophages vs CD4+ T cells, AtT-20 cells vs ESCs, macrophages vs AtT-20 cells. (B) Overlap of the four libraries (STAT3, IL-10, PEC macrophages; STAT3, IL-21, CD4+ T cells; STAT3, LIF, AtT-20 cells; STAT3, LIF, embryonic stem cells): any two cell types, 750 unique sites (42 expected); any three cell types, 188 unique sites (1 expected); shared overlap, 35 unique sites (0 expected). (C) Rank in peak list for CD4+ T cells, ESCs, AtT-20 cells and macrophages. (D) Distance bins from -200 kb to 200 kb, gene desert and random background, shown for ESCs only, CD4+ T cells only, macrophages only, AtT-20 cells only, any two cell types, any three cell types and the shared overlap.]

Figure 1. STAT3 binds to a small but significant 'shared overlap' of binding sites in divergent cell types, but is otherwise strongly cell type specific.
(A) Summary of the distances between the overlapping peaks for each pair of STAT3 ChIP-seq libraries. The distance was measured as the number
of base pairs between the summits of the two overlapping peaks. (B) Overlap of the binding peaks in the four STAT3 libraries from mouse ESCs,
CD4+ T cells, macrophages and AtT-20 cells. Peaks were considered to overlap if their peak summits were within 200 bp of one another. The overlap
was simulated by generating lists of faux ChIP-seq peaks followed by the assessment of their overlap (this was performed 1000 times to generate the
number of overlaps expected by chance; value listed in brackets). (C) The 'shared overlap' sites are more likely found in the most highly ranked
STAT3-binding sites. STAT3-binding sites were ranked by fold enrichment and then the cumulative overlap of STAT3 peaks appearing in all four
cell types was plotted for ESCs, CD4+ T cells, macrophages and AtT-20 cells. (D) Genome distributions of STAT3-binding sites orientated with
respect to the nearest gene to the STAT3-binding site. Coloured bars describe the distance from the STAT3-binding sites to the nearest TSS, as
described in the key. The gray regions denote a random background and represent the expected distribution of peaks were the binding sites randomly
distributed across the sequenceable genome.

with one-third of STAT3 peaks located within 10 kb of the
nearest TSS (Figure 1D and Supplementary Table S4).
However, the 'shared overlap' of 35 STAT3-binding
events across all four cellular types has a very strong
bias toward the TSS, with 30 of 35 (~86%) STAT3
peaks being located within 10 kb of the TSS. As proximity
to the TSS has been linked with the control of gene
expression (4,37), this suggests that the 35 STAT3 peaks
likely regulate the expression of the genes they lie closest
to. Many of these genes are essential for STAT3 function,
including Stat3 itself, Socs3, Bcl3 and Ptpn1 (Figure 2A).
Not surprisingly, the shared 35 STAT3-binding events
show a much greater degree of evolutionary conservation
than any set of cell type-specific STAT3 peaks (Figure 2B).
Most of the 35 STAT3-binding events shared across all
four cellular types are located within 10 kb of a TSS,
although the assignment between TF-binding events and
gene regulation is still an open question as no generally
applicable model has yet been described. When TF-binding
events are assigned to nearby genes by proximity,
typically only 10-20% of such genes are
differentially expressed (10,14,38). To investigate the effects
of the 35 STAT3-binding events on the expression of the
closest genes, we analyzed expression data for the four
cellular types analyzed here plus two additional conditions
where STAT3 is also activated by a cytokine: naive CD4+
T cells stimulated by IL-6 (13) and liver cells stimulated by
IL-6 (for which the STAT3 genome-wide binding profile is
not known) (26). Gene expression values of the
cytokine-stimulated cells were combined and ranked by
fold-change relative to controls, while the expression
data from ESCs were inverted (as the withdrawal of LIF
causes ESCs to differentiate). Remarkably, about one-half
of the genes are up-regulated upon cytokine stimulation in
all six conditions (Figure 3B) and strikingly only a few
genes are down-regulated. This indicates that a core unit
of STAT3 regulation occurs in all these biological contexts
and cellular types, including the well-characterized targets
of STAT3, Socs3 and Bcl3, which are up-regulated in all
six conditions.
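
The TSS-proximity figure quoted above (30 of 35 peaks within 10 kb) is the kind of quantity that can be computed with a nearest-neighbour lookup. The sketch below assumes peaks and TSSs on a single chromosome given as integer coordinates, and its function names and toy coordinates are hypothetical.

```python
import bisect


def nearest_tss_distance(peak, tss_sorted):
    """Distance in bp from a peak summit to the closest TSS."""
    i = bisect.bisect_left(tss_sorted, peak)
    candidates = tss_sorted[max(0, i - 1): i + 1]  # neighbours on either side
    return min(abs(peak - tss) for tss in candidates)


def fraction_within(peaks, tss_positions, max_dist=10_000):
    """Fraction of peaks lying within `max_dist` bp of their nearest TSS."""
    tss_sorted = sorted(tss_positions)
    near = sum(nearest_tss_distance(p, tss_sorted) <= max_dist for p in peaks)
    return near / len(peaks)


# Toy coordinates for illustration only.
toy_tss = [1_000, 50_000, 200_000]
toy_peaks = [500, 45_000, 120_000, 205_000]
toy_fraction = fraction_within(toy_peaks, toy_tss)
```

With the counts from the text, 30 / 35 rounds to 86%, which is the proportion reported for the shared overlap.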

The 35 STAT3-binding events common to all four cell
types regulate a diverse set of genes involved in the
maintenance of specific cellular activities. GO analysis using
GREAT (24) reported an over-representation of signal
transduction pathways related to STAT3 (Figure 3A).
The promoters of Stat3 and Stat1 are both bound and
induced by the recruitment of STAT3 in all four cell
types (Figure 3B). Moreover, it is remarkable to find
that three genes involved in three different pathways
that negatively regulate JAK-STAT signaling are
induced by STAT3, suggesting that STAT3 is acting to
regulate its own activity. These include PTP1B (Ptpn1),
a protein tyrosine phosphatase reported to
dephosphorylate JAK2 and TYK2 and also phospho-STAT3 (39).
Additionally, SOCS3 is part of the ubiquitin pathway
that negatively regulates JAK-STAT signaling by
degrading both JAKs and STATs (40). A third mechanism
of negative feedback is the suppressive function of the
RNA-binding protein Zfp36 (Tristetraprolin), which
negatively regulates IL-10 signaling in macrophages (41).
Zfp36 could thus be playing an identical role in all four
cell types described here. The remaining genes include over
a dozen TFs, such as Stat3, Stat1, Bcl3, Bcl6, Tcf4, Cic,
Sbno1 and the AP-1 family members Fos and Junb
(GO:0003676: 'nucleic acid binding', GREAT FDR
q = 3.6 x 10~^), which deserve further characterization.
In addition to these, our STAT3 target set comprises
several genes encoding proteins involved in DNA
replication and repair, protein translation, cytoskeletal
reorganization and protein trafficking and notably six genes that
encode proteins involved in metabolism (summarized in
Supplementary Table S5). Our findings on the STAT3
transcriptional program common to all four cellular
types indicate that STAT3 establishes its own regulatory
network by the following: (i) perpetuating its own
transcription, (ii) being a master regulator of other TFs
working downstream of it, (iii) stimulating the
transcription of cytoplasmic enzymes that control STAT3's activity
and (iv) ensuring an efficient and robust cellular division
program and the maintenance of a stable cell type. The
STAT3 transcriptional program involves many levels of
cellular control from basic DNA replication and
chromatin remodeling, to the cell metabolic pathways producing
key metabolites needed for increasing the transcriptional
and translational processes essential in cell division and
maintenance.

A distinct TRM defines cell type-independent STAT3 
binding across all four cellular types 

Since STAT3 binds to 35 identical sites across four distinct
cellular types, it is reasonable to assume that it does so by
assembling a TRM that is common to all cellular
types. To reconstruct the putative TRM that directs the
expression of the genes regulated by the 35
STAT3-binding events, we integrated over-represented
TF-binding sites co-occurring with STAT3-binding sites,
protein-protein interaction data and expression data
(as detailed in the 'Materials and Methods' section). The
resulting TRM (Figure 4A and Supplementary Figure S6)
is unique to the 35 STAT3-binding events shared by all
cellular types and contains a number of co-factors and
other proteins that are known to bind to STAT3
experimentally and whose corresponding genes are expressed in
all four cellular types. The co-TFs that appear to work
together with STAT3 in this TRM include many 'general'
TFs known to operate in a variety of biological contexts.
The cell type-independent TRM contains the TFs MYC,
E2F1 and KLF4, all of which have been profiled by
ChIP-seq in ESCs (4). These ChIP-seq libraries were
re-analyzed as before (full details included in
Supplementary Table S1) and we determined the co-occupancy
of the 35 STAT3-binding events by MYC, E2F1 and
KLF4 in ESCs. We found that 34 of 35 (97%) of the
STAT3 peaks are co-occupied (within 800 bp of each
other) by n-MYC, E2F1 or KLF4 and often by several
of them (Figure 4B and Supplementary Figure S7), but
only 19 of 35 (54%) are co-occupied by one of the
ESC-specific factors ESRRB, SOX2, OCT4 or NANOG
(Figure 4B). It must be noted that co-occupancy is not
simply a function of the number of peaks in the ChIP-seq
libraries, as
ESRRB has the largest number of peaks (56 136) as

Figure 3. The shared overlap STAT3-binding sites co-regulate a set of key genes important for STAT3 function in multiple cell types.
(A) Significantly over-represented terms from the 'Pathway Commons' category for the genes associated with the shared overlap. Shown here are
the top five terms only. A significant q-value of 0.05 is represented by a solid black line and a q-value of 0.01 by a dotted gray line. (B) The
expression of the closest genes within 200 kb of the shared overlap was measured in a series of gene expression microarray data sets which show the
activation of STAT3 or the loss of STAT3 activity (LIF withdrawal from ESCs). Expression data are inverted for ESCs for clarity: these genes are
down-regulated upon removal of LIF, whereas all other treatments show up-regulation in response to cytokine stimulation. P-values are from a
Wilcoxon test between the treated and untreated conditions: CD4+ T cells treated with IL-21 (P = 5.59 x 10"^) (GSE19198), naive CD4+ T cells
treated with IL-6 (P = 2.45 x 10"^) (GSE21671), ESCs upon withdrawal of LIF from the medium (P = 0.027) (GSE27708), peritoneal macrophages
stimulated with IL-10 (P = 2.14 x 10""*) (GSE31529), AtT-20 cells treated with LIF (P = 1.00 x 10"") (GSE19042) and liver cells stimulated with IL-6
(P = 0.032) (GSE21060). Genes marked in green do not have a corresponding probe on the microarray.



opposed to E2F1 (11 448 peaks) (Supplementary Table
S1). Finally, for a limited number of STAT3-binding
sites in the shared overlap, we also probed their occupancy
in IL-10-treated macrophages by ChIP-qPCR (Figure 4C).
As expected, STAT3 is specifically recruited to all of the
14 sites we probed by ChIP-qPCR and remarkably E2F1
is not only bound at these same sites but is actually
pre-bound there, close to the STAT3-binding site (with
the exception of Diap1, Figure 4C). STAT3 is therefore
specifically recruited to genomic loci that already have
E2F1 bound. This may explain why several of these
genes show very rapid induction of expression upon
stimulation with IL-10 (14,42) since they are poised for
expression by the presence of E2F1. In summary, STAT3
appears to use a TRM, binding close to E2F1 and
possibly MYC, to regulate a core set of genes that tune
the JAK-STAT pathway and to promote cell proliferation
while counteracting differentiation.

Cell type-specific binding events determine the various
functions of STAT3 in ESCs, CD4+ T cells, macrophages
and AtT-20 cells

Since the 35 STAT3-binding events shared across all four
cell types encode a self-regulatory program for STAT3
that is cell type-independent, the cell type-specific
STAT3-binding events should be related to the specific
functions of STAT3 in each of the four distinct cell
types. GO analysis of the cell type-specific
STAT3-binding events using the 'Biological Process' category
shows that the most over-represented terms in ESCs
include distinctive ESC functions such as 'stem-cell
maintenance' and 'stem-cell differentiation' (Figure 5A).

Figure 4. The shared overlap of STAT3 binding in all four cell types forms a cell type-independent regulatory network with MYC and E2F1.
(A) HOMER was used to generate de novo motifs from the list of 35 STAT3-binding sites common to all four cell types. Motifs resembling STAT3
or a STAT3 half-site were removed and over-represented motifs were collected and annotated to genes. Interaction networks were constructed by
interrogating the PPI network for proteins interacting with those representing the enriched motifs. TFs were clustered together by motif similarity
and coloured by the cluster they belong to: white-colored nodes do not have a representative motif in the databases or do not bind to DNA directly;
proteins with a bold circle have a motif enriched in that cell type, while proteins with no bolded circle have no discovered motif but were linked to
STAT3 through the PPI network. Proteins in the network were filtered by gene expression and here we present the union of the network in all four
cell types (the separate networks are presented in Supplementary Figure S6). (B) ChIP-seq data from ESCs (GSE11431) were re-analyzed and binding
sites overlapping within 400 bp were collected. The heatmap shows the 35 STAT3-binding sites together with the other TFs bound in the vicinity of
STAT3. (C) We designed primers for 14 STAT3-binding sites shared between all four cell types and performed ChIP-qPCR. Macrophages were
[...] the name of the nearest gene.

Figure 5. STAT3 biological specificity is found within the cell type-specific lists of genomic binding sites. Gene Ontology term enrichment analysis
was done using GREAT with default parameters. Over-represented terms displayed here are from the 'Biological Process' category (A) and the
Mouse genome informatics phenotype::genotype category (B). A significant q-value of 0.05 is represented by a solid black line and a q-value of 0.01
by a dotted gray line.



Likewise, in CD4+ T cells and macrophages terms
pertaining to relevant processes were recovered. For AtT-20 cells,
as a pituitary epithelial cell line, there are several terms
related to epithelial cell function, including 'cell-cell
junction organization', 'cellular response to radiation',
'plasma membrane organization' and particularly
'regulation of insulin receptor signaling pathway', indicating the
role that the pituitary plays in responding to insulin. The
over-represented terms from the Mouse Genome
Database genotypes::phenotypes (43) are also indicative
of specific STAT3 functions in the four cellular types,
including 'abnormal cytokine secretion' (macrophages),
'abnormal adaptive immunity' (CD4+ T cells), 'complete
embryonic lethality' (ESCs) and 'increased body
temperature' (AtT-20 cells, since one of the functions of
the pituitary is to regulate body temperature) (Figure 5B).
Collectively, the GO analyses of the cell type-specific
STAT3-binding events suggest that the functional
specificity of STAT3 in each cell type is contained within the cell
type-specific lists of STAT3-binding events and not within
the shared overlap of peaks common to all four cellular
types, which as we have previously shown has diverse cell
type-independent functions.

Non-canonical DNA binding is prevalent across distinct 
cellular types but is not sufficient to explain STAT3's 
divergent functions 

The DNA binding preferences for paralogous classes of 
TF tend to be rather uniform, making it difficult to attri- 
bute biological specificity to DNA base changes in TF- 
binding motifs. A clear example is provided by 
homeodomain TFs, all of which have strikingly similar 
DNA binding preferences despite encompassing all of 
mammalian development (2). The case of STAT3 is particular 
because although STAT3 has a well-defined canonical 
motif (TTCnnnGAA), it has nevertheless been 
reported to use a unique variant motif (TTAnnnGAA) 
to regulate the Prdm1 gene in CD4+ T cells (12). 

The general motif recovered de novo in each set of cell 
type-specific STAT3-binding events is the prototypical 
STAT3 motif (Figure 6A), which is shared by other 
STAT factors too (44). Since alternative motifs appear 
to be important for STAT3 function, we investigated the 
presence of all possible STAT3 motif variants in the set of 
STAT3 peaks in each cellular type. This was done by 
counting STAT3 motifs where single base pairs were individually 
mutated and z-scores were derived by comparing 
against randomly selected sets of background sites drawn 
from the respective ChIP-seq control libraries. 
As expected, the STAT3 canonical binding site was 
found in all four cellular types and the shared overlap 
(Figure 6B). The variant TTAnnnGAA previously 
characterized in CD4+ T cells (12) was over-represented 
not only in CD4+ T cells but also in ESCs and AtT-20 cells 
(but not in macrophages). The variant STAT3 motif 
TGCnnnGAA is over-represented in all four cellular 
types, suggesting that this is a common alternative mode 
of DNA binding by STAT3. Finally, the motif 
CTCnnnGAA appears to be a variant used by STAT3 
exclusively in AtT-20 cells (Figure 6B). 
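
The variant-motif screen described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the choice to count each sequence at most once, the scan of both strands and the use of the population standard deviation are all assumptions made for illustration. It enumerates every single-base substitution of one half-site (TTC) of TTCnnnGAA, counts peaks containing each variant, and compares that count with counts from randomly drawn background sets of the same size.

```python
import re
import statistics

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMP)[::-1]


def single_base_variants(half="TTC"):
    """Canonical half-site followed by all of its single-base substitutions."""
    variants = [half]
    for i, ref in enumerate(half):
        for base in "ACGT":
            if base != ref:
                variants.append(half[:i] + base + half[i + 1:])
    return variants


def count_sites(seqs, left, right="GAA", spacer=3):
    """Number of sequences with at least one left-N{spacer}-right match on either strand."""
    gap = "[ACGT]{%d}" % spacer
    forward = left + gap + right
    reverse = revcomp(right) + gap + revcomp(left)
    pattern = re.compile(forward + "|" + reverse)
    return sum(1 for s in seqs if pattern.search(s.upper()))


def z_score(observed, background_counts):
    """(observed - mean) / sd over background counts (assumed definition)."""
    mu = statistics.mean(background_counts)
    sd = statistics.pstdev(background_counts)
    return float("nan") if sd == 0 else (observed - mu) / sd


def variant_z_scores(peak_seqs, background_sets, half="TTC"):
    """Map each half-site variant to its z-score.

    background_sets is a list of sequence lists, each ideally the same size
    as peak_seqs and drawn at random from the matching control library.
    """
    scores = {}
    for left in single_base_variants(half):
        observed = count_sites(peak_seqs, left)
        background = [count_sites(bg, left) for bg in background_sets]
        scores[left] = z_score(observed, background)
    return scores
```

With real data, `peak_seqs` would hold the sequences under the STAT3 peaks of one cell type. A large positive score for a key such as `"TTA"` or `"TGC"` would correspond to an over-represented variant in the heatmap of Figure 6B.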

Moreover, we quantified the usage of the various ca- 
nonical and variant motifs for each set of STAT3 peaks 
(Figure 6C). All STAT3 peaks contain putative half-sites 
(TTCC), often several of them per peak. At most 28% of 
STAT3 peaks harbor a perfect canonical site (in ESCs) 
while non-canonical sites are especially prevalent in 
AtT-20 cells (48% of sites). The set of 35 STAT3- 
binding events shared by all four cellular types turned 
out to be especially conservative in the use of canonical 
motifs, with 43% containing a canonical STAT3 motif. 
The binding of STAT3 to variant motifs could be linked 
to specific functions: for instance, the binding of STAT3 
to canonical TTCnnnGAA might facilitate the recruitment 
of generic co-activators, such as p300, whereas the 
binding to the variant motif TTAnnnGAA (as is the case 
in CD4+ T cells) might lead to the recruitment of specific 
regulatory complexes. The differential recruitment of 
co-activators could be allosterically induced by different 
DNA ligands, as demonstrated for the glucocorticoid 
receptor (45), a model that provides a conceptual framework 
for understanding the functional plasticity of pleiotropic 
TFs such as STAT3. Here, the binding of STAT3 to 
variant DNA motifs induces conformational changes on 
the binding proteins, which in turn affect the recruitment 
of co-activators. Indeed, in the case of STAT3, specific 
base pairs that vary between the canonical and the 
variant motifs TTAnnnGAA and TGCnnnGAA (i.e. 
T[TC] versus T[TA] or T[GC]) are directly contacted by 
Asn466 of the connector domain of STAT3, in which case 
Asn466 must structurally rearrange to accommodate the 
altered chemical environment introduced by these variant 
motifs (Figure 6D) (46). These structural rearrangements 
could translate into global structural changes to the connector 
domain as well as the neighboring and C-terminal 
SH2 domain that mediates interactions with other 
proteins by recognizing phosphotyrosine residues. 
A further effect of specific variant motifs and specific 
co-factor interactions might be the stabilization of the 
TRM and the tethering of STAT3 to DNA. Indeed, 
Husby et al. (47) recently proposed that the two halves 
of the STAT3 homodimer do not make identical base pair 
contacts on a TTAGnGGAA variant motif. We found no 
clear link between the presence of variant motifs and 
specific biological functions, although there were several 
prominent examples including the aforementioned Prdm1, 
Il17a and Il17ra (both of which have nearby STAT3-binding 
sites with the variant motif TTAnnnGAA) and 
Cd28, which is associated with the STAT3 variant motif 
TGCtggGAA. The presence of alternative STAT3-binding 
motifs highlights the usage of variant motifs in regulating 
specific genes, although no clear pattern that explains how 
variant motifs encode the cell type-specific functions of 
STAT3 could be discerned. 
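
A per-peak tally of motif usage of the kind summarized in Figure 6C could be sketched as follows. This is a hypothetical illustration rather than the procedure used here. The category names, the default variant list and the precedence rule (canonical first, then variant, then half-site only) are assumptions, since the text does not state how a peak carrying several motif types is assigned.

```python
import re
from collections import Counter

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMP)[::-1]


def has_site(seq, left, right="GAA", spacer=3):
    """True if left-N{spacer}-right occurs on either strand of seq."""
    gap = "[ACGT]{%d}" % spacer
    forward = left + gap + right
    reverse = revcomp(right) + gap + revcomp(left)
    return re.search(forward + "|" + reverse, seq) is not None


def classify_peak(seq, variant_lefts=("TTA", "TGC")):
    """Assign one category per peak; the precedence order is an assumption."""
    seq = seq.upper()
    if has_site(seq, "TTC"):
        return "canonical"
    if any(has_site(seq, left) for left in variant_lefts):
        return "non-canonical"
    if "TTCC" in seq or "GGAA" in seq:  # half-site on either strand
        return "half-site only"
    return "none"


def motif_usage(peaks, variant_lefts=("TTA", "TGC")):
    """Percentage of peaks falling in each category."""
    counts = Counter(classify_peak(p, variant_lefts) for p in peaks)
    total = len(peaks)
    return {category: 100.0 * n / total for category, n in counts.items()}
```

For AtT-20 cells, the variant list would be extended with `"CTC"` to include the CTCnnnGAA motif discussed above.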

Distinct TRMs determine STAT3's cell type-specific 
functions 

In the absence of a clear contribution by variant DNA 
motifs to explain STAT3's cell type-specific functions, an 
attractive model is one where various factors cluster 
together in a cooperative manner to provide biological 
specificity (48). This 
'piling-up' of TFs and co-factors has been demonstrated 
by ChIP-seq experiments in ESCs, B cells and blood stem-cell 
precursors (4,5,29). One limitation of this approach, 
however, is that the identities of biologically relevant TFs 
need to be known in advance before performing any 
ChIP-seq experiment that will eventually allow one to 
reconstruct a transcriptional regulatory network (49). 

By employing the same data integration and analysis 
techniques that we used to reconstruct the cell type- 
independent TRM regulating the shared overlap of 35 
STAT3-binding events (Figure 4A), we generated models 
for the cell type-specific binding events of STAT3 in 
ESCs, CD4+ T cells, macrophages and AtT-20 cells 
Figure 6. STAT3 uses alternative (non-canonical) modes of binding to DNA. (A) De novo generated motifs from HOMER are very similar in the six 
lists of STAT3 binding. Motifs were generated from the entire lists of STAT3-binding sites for each category, except for CD4+ T cells where the top 
1000 sites were used (as for the entire list we could only identify a STAT3 half-site). (B) z-score heatmap to show over-representation of variant 
STAT3 motifs in the STAT3-binding sites in the various cell types. Variant motifs are presented here as all single base pair mutations of one-half of 
the STAT3 heterodimeric motif. (C) Pie charts showing the frequency of the DNA words TTCnnnGAA (canonical STAT3) or a non-canonical 
STAT3 binding: TTAnnnGGA and TGCnnnGGA for ESCs, CD4+ T cells, any two cell types and any three cell types; TGCnnnGAA for macrophages 
and the shared overlap; and TTAnnnGGA, TGCnnnGGA and CTCnnnGAA for AtT-20 cells. (D) Cartoon representation of the Asn466 
amino acid of STAT3 making contact with the DNA base pairs [PDB entry 1bg1 (46)]. 



(Figure 7A-D). We managed to recover the well-characterized 
ESC transcriptional regulatory network (4) 
comprising OCT4 (represented here by the motif 
POU2F2), SOX2, ESRRB, KLF4 and SMAD1 and a 
number of homeodomain proteins that may correspond 
to NANOG (Figure 7A). In addition, we identified 
TEAD1, a protein recently shown to be essential to 
maintain ESC pluripotency (50) and also identified in 
human ESCs as a potential key TF (51), and REST, 
whose role is to repress specific genes in ESCs and so 
block differentiation (52). This network is reminiscent of 
the ESC networks determined by mass spectrometry 
(53,54), although it is only a partial match, possibly 
because STAT3 was not used as a bait in the generation 
of those networks. 

The transcriptional regulatory network of CD4+ T cells 
is much less well characterized, but we managed to 
identify the only known STAT3-binding partner in 
CD4+ T cells, IRF4 (12) (Figure 7B). Our model of the 
TRM of CD4+ T cells is completed by members of the 
AP-1 TF family, several ETS-related TFs that are known to 
play important roles in T-cell biology (55) and a diversity 
of other TFs. ATF1 is recruited to the Ifng promoter in 
CD4+ T cells and with CREB represses Ifng expression in 
naive CD4+ T cells (56). Tantalizingly, STAT3 is also 
bound at the Ifng promoter and may co-operate with 
ATF1 to regulate Ifng expression. Finally, we also 
identify multiple GATA factors associated with STAT3 
(Figure 7B), of which GATA3 is known to have 
multiple roles in T-cell development (10). 

In macrophages, several classes of TF were identified 
(Figure 7C), the most important of which are members 
of the AP-1 family (FOS, NFE2L2 and JUN) and CEBPA, 
which is known to reprogram pre-B cells 
into macrophages (57). STAT3 might in turn be regulating 
the expression of FOS and NFE2L2, as it binds very close 
to their TSSs. 

The STAT3 transcriptional regulatory network of 
AtT-20 cells is less well characterized than those of 
ESCs and CD4+ T cells, and our TRM model suggests 
many potential partners for STAT3 in AtT-20 cells 
(Figure 7D). We did not recover any motif matching the 
glucocorticoid receptor (GR)-binding site. Although GR is 
linked to the function of STAT3 in AtT-20 cells, it 
co-localizes with STAT3 in only 21% of STAT3-binding 
sites, of which ~50% require the presence of glucocorticoid 
to co-recruit GR and STAT3 (15). 

Finally, a pairwise correlation among the TF-specific 
binding events derived from ChIP-seq libraries done in 
ESCs, and all of the STAT3-binding events derived 
likewise (across all four cellular types), shows the distinct 
TRMs characteristic of ESCs and T cells (Figure 7E). 
These TRMs contain both the previously reported TRM 
specific to ESCs (OCT4, SOX2, ESRRB and NANOG) 
and the 'MYC-regulated' block (MYC, E2F1). 
Additionally, a new TRM consisting of STAT3, IRF4 
and a more loosely associated GATA3 emerged in 
various T-cell subsets. STAT3 binding in macrophages 
and AtT-20 cells has no other partner TFs from the published 
ChIP-seq libraries and hence forms isolated clusters. 
These results demonstrate that STAT3 forms distinct cell 
type-specific TRMs to execute a diversity of gene expression 
programs. 
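
One simple way to compute a pairwise correlation between ChIP-seq binding profiles is sketched below in Python. The approach is an assumption chosen for illustration and is not necessarily the method behind Figure 7E: peaks are reduced to fixed-size genomic bins, each library becomes a binary occupancy vector over the union of occupied bins, and the Pearson correlation is taken for every pair of libraries. The bin size and all function names are hypothetical.

```python
import math
from itertools import combinations


def to_bins(peaks, bin_size=500):
    """Set of (chrom, bin index) pairs overlapped by (chrom, start, end) peaks."""
    bins = set()
    for chrom, start, end in peaks:
        for b in range(start // bin_size, (end - 1) // bin_size + 1):
            bins.add((chrom, b))
    return bins


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors (nan if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlation(libraries, bin_size=500):
    """Correlate binary bin-occupancy vectors for every pair of libraries.

    libraries maps a library name to a list of (chrom, start, end) peaks.
    Only bins occupied in at least one library enter the comparison.
    """
    binned = {name: to_bins(p, bin_size) for name, p in libraries.items()}
    universe = sorted(set().union(*binned.values()))
    vectors = {
        name: [1 if u in occupied else 0 for u in universe]
        for name, occupied in binned.items()
    }
    return {
        (a, b): pearson(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }
```

Hierarchical clustering of the resulting matrix would then group libraries whose factors co-occupy the same regions. Libraries without co-binding partners would remain isolated, as described above for STAT3 in macrophages and AtT-20 cells.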



DISCUSSION 

In this study, we have investigated the factors that determine 
the functions of STAT3 in various cell types. STAT3 
binding is predominantly cell type-specific, with just a 
small but significant number (35) of binding sites shared 
among ESCs, CD4+ T cells, macrophages and AtT-20 
cells. This shared overlap appears to be a mode of 
auto-regulation for STAT3 function and targets many 
genes that are all directly up-regulated by STAT3 
binding. Additionally, STAT3 also co-binds with E2F1, 
which is present at the same STAT3-binding sites in at 
least ESCs and macrophages, and we demonstrate that 
at least in macrophages E2F1 is even pre-bound to the 
future STAT3-binding sites. This may explain why so 
many of these STAT3-regulated genes are rapidly 
induced in < 1 h upon stimulation with a cytokine. The 
recruitment of STAT3 to DNA via variant motifs does 
not appear to be a major factor regulating STAT3's functions 
across diverse cell types. Instead, a more plausible 
explanation is provided by the assembly of distinct 
TRMs around STAT3 in the four cell types. We 
produced models of TRMs by integrating TF-binding 
site data, protein-protein interaction and expression 
data that provide an explanation for STAT3's specific biological 
functions across distinct cell types. Using just the 
STAT3-binding events in ESCs, we succeeded in recovering 
the ESC regulatory network and also propose 
STAT3-based TRMs for CD4+ T cells, macrophages 
and AtT-20 cells. 

The cell type-specific TRM models of STAT3 predict an 
important level of epigenetic regulation, as these contain 
many histone-modifying enzymes, several histone deacetylases 
(HDAC2, HDAC3), histone acetyltransferases 
(EP300) and remodeling enzymes (SMARCE1, 
SMARCA4, SMARCC1) (Figure 7A-D). Additionally, 
in human ESCs, SIN3A, EP300 and HDAC2 are known 
to be associated with the ESC regulatory network (51), of 
which we identified SIN3A and EP300 (Figure 7A). 
SIN3A has been proposed as a master regulator of 
STAT activity, particularly in guiding STAT3 to the 
Socs3 gene (58). Histone modifications likely play a 
major role in determining cell type specificity either by 
blocking other STAT3-binding sites from becoming available 
or by pre-marking sites that STAT3 can be recruited 
to. Vallania et al. (59) computationally predicted ~1.3 
million STAT3-binding sites in the mouse genome. We 
nevertheless know from ChIP-seq experiments that 
STAT3 binds to only a fraction of these sites in distinct 
cell types. Although other members of the STAT family 
that share a very similar DNA-binding motif with STAT3 
(44) might actually be occupying these other sites, in our 
opinion this possibility does not entirely explain why all 
those potential binding sites are unavailable to STAT3. 
Therefore, epigenetic regulation is probably a major 
factor controlling STAT3 transcriptional programs, as 
has previously been shown for members of the STAT 
family of TFs (17). It remains to be determined whether 
chromatin is initially available for STAT3 recruitment, 
whether STAT3 binds to inaccessible sites but recruits 
chromatin modifiers to locally open the chromatin or a 
combination of both. 

A number of TFs have been profiled by ChIP-seq in 
multiple cell types and a spectrum is emerging that goes 
from the exclusively cell type-invariant TFs to other TFs 
that are exclusively cell type-specific and a number of TFs 
that lie in between these two extremes. Among the 
remarkably cell type-invariant TFs is CTCF (60), whose 
primary role appears to be in maintaining the architecture 
Figure 8. A putative model explaining how STAT3 can perform both cell type-independent and cell type-specific functions by assembling around distinct 
TRMs. STAT3 binding to the genome occurs in two distinct ways: (i) a cell type-independent mode that is primarily concerned with the regulation of 
STAT3's own activity and (ii) a number of cell type-specific modes that execute distinct transcriptional programmes in various cell types. 



of the genome. However, CTCF can also act as a typical 
cell type-specific TF. REST is 
another cell type-independent TF with important roles in 
ESCs and neural precursors (52). Among the cell 
type-specific TFs is SMAD3, which shares only three 
genomic binding sites across mouse ESCs, myotubes and 
pro-B cells, with only 1% of sites appearing in two or more cell 
types (35). Additionally, SMAD1 and TCF7L2 show only 
a small overlap in binding sites in two different human cell 
lines in the erythroid lineage (61) and TCF7L2, similarly to 
STAT3, shows only a small overlap in common between 
six cell lines (62). MYC lies somewhere in the center of 
the specificity spectrum (60), thus reflecting its dual role as 
a cell type-specific TF and a 'global' regulator of transcription 
potentially regulating ~15% of human genes 
(63). Finally, GATA3 binding was explored in multiple 
CD4+ T-cell subtypes and shown to be predominantly 
cell type-specific even within closely related cells of the 
T-cell lineage (10). Therefore, we could say that both 
GATA3 and STAT3 are predominantly cell type-specific 
although they share a limited number of binding sites that 
are common to all cell types where they have been 
profiled. These observations place GATA3 and STAT3 
between MYC and SMAD1/SMAD3/TCF7L2. 



CONCLUSIONS 

Our analyses lead us to propose a dual model for STAT3 
transcriptional regulation: (i) a limited, cell type-independent 
and evolutionarily conserved mode whereby 
STAT3 binds to DNA close to (and may co-operate with) 
MYC and E2F1 (and possibly other factors too) to 
regulate the expression of a set of genes in multiple cell 
types that regulate STAT3's own signaling activity and (ii) 
a broader set of cell type-specific binding modes controlled 
by distinct cell type-specific TRMs that form around 
STAT3 to enact cell type-specific functions (Figure 8). 



This dual model of transcriptional regulation by means 
of assembling distinct TRMs around a master TF could 
be a general mechanism to regulate the transcriptional 
programs of other pleiotropic TFs. 